Abstract

Multi-user multi-input-multi-output (MU-MIMO) systems transmit data to multiple users
simultaneously using the spatial degrees of freedom, relying on channel state information (CSI)
fed back by the users. Most of the existing literature on reduced-feedback user scheduling focuses
on throughput performance, and the user queueing delay is usually ignored. As delay is very
important for real-time applications, a low-feedback queue-aware user scheduling algorithm is
desired for MU-MIMO systems. This paper proposes a two-stage queue-aware user scheduling
algorithm, which consists of a queue-aware mobile-driven feedback filtering stage and an SINR-based
user scheduling stage, where the feedback filtering policy is obtained from the solution of an
optimization problem. We evaluate the queueing performance of the proposed scheduling algorithm
using sample path large deviation analysis. We show that the large deviation decay rate for the
proposed algorithm is much larger than that of the CSI-only user scheduling algorithm. The
numerical results also demonstrate that the proposed algorithm performs much better than the
CSI-only algorithm while requiring only a small amount of feedback.

I. Introduction

MIMO is an important core technology for next generation wireless systems. In particular, in
multi-user MIMO (MU-MIMO) systems, a base station (BS) with M transmit antennas communicates
with multiple mobile users simultaneously using the spatial degrees of freedom, at the expense of
requiring knowledge of the channel states at the transmitter (CSIT). It is shown in [1], [2] that,
using a simple zero-forcing precoder and near-orthogonal user selection, a sum rate of M log log K
can be achieved with full CSIT knowledge. Yet, full CSIT knowledge is difficult to obtain in
practice, and many works focus on reducing the feedback overhead in MIMO systems [3]-[8]. For
instance, in [3], [4], the authors focused on codebook design and performance analysis under
limited-rate feedback schemes. In [5]-[7], on the other hand, a threshold-based feedback control is
adopted, where a user attempts to feed back only when its channel quality exceeds a threshold. It
was further shown that a sum rate capacity of O(M log log K) can be achieved when only
O(M log log log K) users feed back to the BS [?].

While many works consider reduced-feedback design for MU-MIMO, all of them focus on throughput
performance. They assume an infinite backlog at the base station and therefore ignore the bursty
arrival of the data source as well as the associated delay performance, which is very important
for real-time applications. Intuitively, the CSI indicates a good opportunity to transmit, whereas
the queue state information (QSI) indicates the urgency of the data flow. A delay-aware MU-MIMO
system should incorporate both the CSI and the QSI in the user scheduling. However, it is far from
trivial to integrate these two types of information in determining the user priority. Some works
consider QSI in the user scheduling of MU-MIMO systems. In [9], the author considered queue-aware
power control and dynamic clustering in downlink MIMO
systems. In [10], the authors considered MU-MIMO user scheduling to maximize the queue-weighted
sum rate. Due to the exponentially large solution space, a heuristic greedy algorithm is proposed.
However, these works require the BS to have global CSI knowledge of all the users, which is hard
to achieve in practice. Furthermore, the delay performance in [10] is obtained by simulation only,
and few design insights can be obtained from these works. In general, there are still a number of
first-order technical challenges associated with designing delay-aware MU-MIMO systems.

• Challenges in User Scheduling Design: For real-time applications, it is important to exploit
both CSI and QSI in the user scheduling. Yet, it is highly non-trivial to design a priority metric
that strikes a balance between transmission opportunity and urgency. A brute-force stochastic
optimization approach such as MDP [11], [12] ends up with a solution of huge complexity
(exponential w.r.t. K), which is highly impractical. On the other hand, brute-force application
of Lyapunov optimization techniques [13] in MU-MIMO is also not feasible because of the
associated exponential complexity of user selection for MU-MIMO.

• Challenges in Delay Analysis: Due to the QSI-aware control algorithm, the service rates of the
data queues are state-dependent and the queue dynamics of the K data flows are coupled
together. This makes the queueing delay analysis extremely difficult. There are no closed-form
results on the steady state distribution of the queue length in such complex queueing systems.
In [14], the authors characterized the stability region of MU-MIMO systems under limited
CSI feedback. Yet, stability is only a weak form of delay performance.
In this paper, we consider an MU-MIMO downlink system with an M-antenna BS and K multi-antenna
mobile users. The BS applies random beamforming for MU-MIMO to exploit the multi-user diversity.
To overcome the complexity challenge of user scheduling, we propose a two timescale delay-aware
user scheduling policy for the MU-MIMO system. The proposed policy consists of two stages, namely
the queue-aware user-driven feedback filtering stage and the dynamic SINR-based user scheduling
stage. At the first stage (slower timescale), the BS broadcasts a QSI-dependent user feedback
candidate list, and only mobiles in the list are allowed to feed back their CSI to the BS. At the
second stage (faster timescale), the BS selects the strongest users based on the CSI of the users
selected in the first stage. Based on the two timescale user scheduling policy, we then analyze the
delay performance of the MU-MIMO system. It is in general difficult to analyze the delay for
state-dependent coupled queues. To overcome this challenge, we consider the large deviation tail
for the maximum queue length among all the users, which reflects the worst-case delay performance
in the system. Using large deviation theory for random processes [15], we derive the asymptotic
exponential decay rate for the tail probability of the maximum queue length. Specifically, we
quantify the asymptotic decay rate $-\frac{1}{B}\log \Pr(\max_k Q_k > B)$ as the buffer size
$B \to \infty$. We show that the decay rate of the worst-case queue length of the proposed
delay-aware scheduling algorithm scales as O(log K), which is substantially better than that of
traditional MU-MIMO user scheduling baseline schemes.

The rest of the paper is organized as follows. We present the system model, the bursty data source
and queueing model, and the proposed two timescale delay-aware user scheduling policy in Section
II. In Section III, we derive the optimal user-driven feedback filtering strategy using a Lyapunov
approach. We then analyze the maximum queue length property using sample path fluid approximation
and large deviation theory in Section IV. Numerical results are provided in Section V, and we
conclude the paper in Section VI.

Notations: $f(x) = O(g(x))$ denotes $\lim_{x\to\infty} f(x)/g(x) < \infty$, $f(x) = o(g(x))$
denotes $\lim_{x\to\infty} f(x)/g(x) = 0$, and $\mathbb{E}_x[f(x,y)] = \int f(x,y)\,dF_x(x)$
denotes the expectation over the random variable x (treating y as constant).

II. System Model 

A. MU-MIMO System Model 

We consider a downlink MU-MIMO system with an M-antenna BS and K geographically dispersed
mobile users ($K \geq M$). Each mobile user has N receive antennas. Using MU-MIMO techniques,
the BS transmits M data streams to a group of selected users at each time slot. The wireless
channel between each user and the BS is modeled as a Rayleigh fading channel. Specifically, the
received signal $Y_k \in \mathbb{C}^{N\times 1}$ at user k is given by

$$Y_k = \sqrt{P}\, H_k x + n_k, \quad \forall k \in \mathcal{A}(t) \qquad (1)$$

where $x \in \mathbb{C}^{M\times 1}$ is the normalized transmitted signal with
$\mathbb{E}[\mathrm{Tr}(xx^*)] = M$, i.e., the normalized transmit power on each antenna is assumed
to be one, $H_k \in \mathbb{C}^{N\times M}$ is the zero-mean, unit-variance circularly symmetric
complex Gaussian channel matrix from the transmitter to user k,
$n_k \in \mathbb{C}^{N\times 1} \sim \mathcal{CN}(0, I_N)$ is the additive Gaussian noise vector,
P is the transmit power at the BS, and $\mathcal{A}(t)$ denotes the set of scheduled users at time
slot t. We make the following assumption on the channel matrices $\{H_k\}$.

Assumption 1 (Assumptions on Channel Matrices): The channels are assumed to be quasi-static
block fading, where each channel realization remains constant during each time slot but is
independent and identically distributed (i.i.d.) across different time slots. The mobile users are
assumed to have perfect knowledge of their local CSI. However, only a selected portion of the
users will feed back their CSI to the BS, and the feedback information is delivered through a
noiseless feedback channel. ■

At the BS, random beamforming is used to support near-orthogonal data stream transmissions
to the selected users without knowing the full CSI $\{H_k\}$. The BS chooses M random orthonormal
vectors $\{\phi_1, \ldots, \phi_M\}$, where $\phi_m \in \mathbb{C}^{M\times 1}$ are generated
according to an isotropic distribution. Let $s(t) = (s_1(t), \ldots, s_M(t))$ be the vector of
transmit symbols. The transmit signal is given by

$$x(t) = \sum_{m=1}^{M} \phi_m s_m(t).$$

Therefore, the received signal at the k-th user is

$$Y_k(t) = \sqrt{P} \sum_{m=1}^{M} H_k(t)\, \phi_m s_m(t) + n_k(t).$$

We assume the receivers know the beamforming vectors $\{\phi_m\}$. The effective SINR of the i-th
beam on the n-th receive antenna of the k-th user can be calculated as follows,

$$\mathrm{SINR}^i_{k,n} = \frac{|H_k^{(n)} \phi_i|^2}{1/P + \sum_{j \neq i} |H_k^{(n)} \phi_j|^2} \qquad (2)$$

where $H_k^{(n)}$ denotes the n-th row of the channel matrix of user k. By selecting the users with
the highest SINR on each beam, the transmitter can support near-orthogonal transmission and exploit
multi-user diversity without the global CSI $\{H_k\}$ [16].



B. Bursty Data Source and Queue Model 

Data arrive in packets randomly for different users. Let $A_k(t)$ denote the number of packets that
arrive at the BS for user k during time slot t, and $\mathbf{A}(t) = (A_1(t), \ldots, A_K(t))$. We
make the following assumption regarding the bursty arrival processes $A_k(t)$.

Assumption 2 (Bursty Source Model): The packet arrival $A_k(t)$ is i.i.d. with respect to (w.r.t.)
t and independent w.r.t. k, following a general distribution with mean
$\mathbb{E}[A_k(t)] = \lambda_k$ and moment generating function (MGF)
$\Lambda_{A,k}(\theta) = \mathbb{E}[e^{\theta A_k}]$. The packet length is assumed to be a constant
L bits. ■

The BS maintains a queueing backlog $Q_k(t)$ for each user k. Let
$D_k(\mathbf{Q}(t), \mathbf{H}(t))$ represent the amount of departure for user k at time slot t,
where $\mathbf{Q}(t) = (Q_1(t), \ldots, Q_K(t))$ and $\mathbf{H}(t) = (H_1(t), \ldots, H_K(t))$.
The queueing dynamics for user k are given by

$$Q_k(t+1) = \big[Q_k(t) - D_k(\mathbf{Q}(t), \mathbf{H}(t))\big]^+ + A_k(t) \qquad (3)$$

where the operator $[w]^+ = \max\{0, w\}$. Using Little's Law [17], the average delay of the k-th
user is given by $\bar{Q}_k / \bar{D}_k$, where $\bar{Q}_k$ is the average backlog of the k-th queue
and $\bar{D}_k$ is the average departure at each time slot. As a result, there is no loss of
generality in studying the queue length $Q_k$ for the purpose of understanding the delay.
The queue length (or the delay) of the MU-MIMO system depends on how the channel resources are
used. Hence, the goal of the user scheduling controller is to adjust the channel access
opportunity of all the users so that their queue lengths (or delays) are minimized while a
reasonable system throughput is maintained.
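
As a small illustration of the queue recursion in (3), the following Python sketch applies one slot of the update to a vector of backlogs. The function name `queue_step` and the example numbers are illustrative assumptions, not part of the paper's model; departures and arrivals are simply passed in as lists.

```python
def queue_step(q, departures, arrivals):
    """One slot of the recursion Q_k(t+1) = max(Q_k(t) - D_k, 0) + A_k(t).

    q, departures and arrivals are lists with one entry per user
    (hypothetical inputs; how D_k is chosen is the scheduler's job).
    """
    return [max(qk - dk, 0) + ak for qk, dk, ak in zip(q, departures, arrivals)]


# Example: user 1 is offered more service than it has backlog, so its
# queue empties before the new arrival is added.
next_q = queue_step([3, 0, 5], [1, 2, 0], [0, 1, 2])
print(next_q)
```

The projection `max(..., 0)` is what couples the service actually received to the backlog: unused service in a slot is lost rather than carried over.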



C. Two Timescale User Scheduling with Reduced Feedback for MU-MIMO 

A reasonable delay-aware user scheduling algorithm should jointly adapt to both the CSI (to capture
good transmission opportunities) and the QSI (to capture the urgency). In particular, we are
interested in the control policy that maximizes the queue stability region. However, conventional
throughput-optimal (in the stability sense) user scheduling policies such as max-weighted-queue
(MWQ) algorithms [13], [14] require global CSI and QSI knowledge, whereas the CSI is available at
the mobile user side and the QSI is available at the BS. Furthermore, the MWQ policy requires
solving a queue-weighted sum rate combinatorial optimization problem, which has an exponential
search space. Hence, a brute-force solution of the MWQ problem requires huge signaling overhead as
well as huge complexity. To overcome these challenges, we propose a two timescale user scheduling
solution as follows.

• Stage I: Queue-aware user-driven feedback filtering. The BS determines and broadcasts the user
feedback probabilities $\{p_1(\mathbf{Q}), \ldots, p_K(\mathbf{Q})\}$ based on the user queueing
backlogs $\mathbf{Q}(t)$ once every T time slots. Mobile user k attempts to feed back to the BS in
Stage II with probability $p_k$. We denote $\chi_k \in \{0, 1\}$ as the stochastic feedback
filtering policy with $P(\chi_k = 1) = p_k$, and user k feeds back when $\chi_k(t) = 1$. The
motivation for the mobile feedback filtering is to save feedback cost by preventing lower-priority
users from feeding back.

• Stage II: Dynamic user scheduling based on SINR feedback. If the feedback filtering policy
$\chi_k = 1$, then user k measures the effective SINR vector
$\{\mathrm{SINR}^1_{k,n}, \ldots, \mathrm{SINR}^M_{k,n}\}$ on each receive antenna n according to
(2) and finds the strongest beam $i^*(k,n) = \arg\max_{1 \le i \le M} \mathrm{SINR}^i_{k,n}$.
The mobile then feeds back the selected beam index $i^*(k,n)$ and the associated
$\mathrm{SINR}^{i^*(k,n)}_{k,n}$ to the BS. The set of feedback users at time slot t is denoted by
$\mathcal{F}(t)$. The BS schedules on the i-th beam the user $k^*(i)$ that has the highest SINR,
i.e., $k^*(i) = \arg\max_{k \in \mathcal{F}(t)} \gamma_k^i$, where
$\gamma_k^i = \max_{n \in \mathcal{N}(k,i)} \mathrm{SINR}^i_{k,n}$ denotes the highest SINR of user
k on the i-th beam. Here $\mathcal{N}(k,i) = \{n : 1 \le n \le N,\ i^*(k,n) = i\}$ denotes the set
of antennas of user k which have fed back the SINR for the i-th beam¹. As a result, the Stage II
user scheduling exploits the multi-user diversity among the set of users attempting to feed back,
$\mathcal{F}(t)$.
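
The two stages above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the SINR values are supplied directly as a nested list `sinr[k][n][i]` (user, antenna, beam) instead of being computed from channel matrices, and the function names `stage1_filter` and `stage2_schedule` are hypothetical.

```python
import random


def stage1_filter(p, rng):
    """Draw chi_k ~ Bernoulli(p_k) and return the users allowed to feed back."""
    return [k for k, pk in enumerate(p) if rng.random() < pk]


def stage2_schedule(sinr, feedback_users, num_beams):
    """Pick, for every beam, the feedback user with the highest reported SINR.

    sinr[k][n][i] is the SINR of beam i on antenna n of user k.
    Returns a list whose i-th entry is (user, sinr) or None if nobody
    reported beam i.
    """
    best = [None] * num_beams
    for k in feedback_users:
        for per_beam in sinr[k]:
            # Each antenna reports only its strongest beam i*(k, n).
            i_star = max(range(num_beams), key=lambda i: per_beam[i])
            value = per_beam[i_star]
            if best[i_star] is None or value > best[i_star][1]:
                best[i_star] = (k, value)
    return best


# Three single-antenna users, two beams (illustrative numbers).
sinr = [[[1.0, 0.5]], [[0.2, 3.0]], [[5.0, 0.1]]]
allowed = stage1_filter([1.0, 1.0, 0.0], random.Random(0))
print(allowed, stage2_schedule(sinr, allowed, num_beams=2))
```

Note that user 2 has the best channel on beam 0 but is filtered out in Stage I, so the beam goes to user 0; this is the trade-off between urgency and opportunity that the policy is designed to manage.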

Fig. 1 illustrates the two-stage user scheduling policy. The policy tries to balance transmission
opportunity and urgency with a low-complexity and low-feedback-cost strategy. A user with a long
queue is given priority to feed back during the Stage I feedback filtering phase. Users who have
passed the Stage I filtering then compete for channel access in the Stage II SINR-based
scheduling, in which users with better channel conditions are served. Moreover, the two stages can
be implemented on different timescales. The user selection in Stage II is done at every time slot
t, while the user feedback probabilities $\{p_k(\mathbf{Q})\}$ determined in Stage I can be updated
once every T time slots. The update period T trades off the performance of the two timescale
policy against the control signaling overhead. With a large T, there is a smaller signaling
overhead associated with broadcasting $\{p_k(\mathbf{Q})\}$ in Stage I, but the feedback priority
may be driven by outdated QSI.

¹We define $\gamma_k^i = 0$ if $\mathcal{N}(k,i) = \emptyset$.

Figure 1. The two-stage joint CSI and QSI user scheduling in a multi-user MIMO system. At Stage I,
the BS determines the user feedback priority based on the QSI. At Stage II, a portion of selected
users feed back their CSI and the BS schedules users for transmission based on their CSI feedback.



D. Queue-Aware Feedback Filtering (Stage I) Optimization 

The feedback filtering control in stage I plays a critical role in the overall delay performance of 
the MU-MIMO system. In the following, we adopt a Lyapunov optimization technique to derive the 
stage I feedback filtering policy to achieve the maximum queue stability region in the MU-MIMO 
system. 

1) Queue Stability: We first define the queue stability and the stability region formally below.

Definition 1 (Queue Stability): The queueing system is called stable if

$$\limsup_{t \to \infty} \frac{1}{t}\, \mathbb{E}\Big[\max_k Q_k(t)\Big] < \infty.$$

Definition 2 (Stability Region and Throughput Optimal Policy): The stability region $\mathcal{C}$
is the closure of the set of all the arrival rate vectors $\{\lambda_k\}$ that can be stabilized in
an MU-MIMO system using the two timescale user scheduling (Stage I and Stage II) framework in
Section II-C. A throughput optimal user scheduling policy is a policy that stabilizes all the
arrival rate vectors $\{\lambda_k\}$ within the stability region $\mathcal{C}$. ■

2) The Data Rate and the Amount of Feedback: Let $I_k^i \in \{0, 1\}$ be the scheduling indicator
of the k-th user on the i-th beam. Note that the scheduling indicator $I_k^i(\mathbf{H}, \chi)$ is
a deterministic function of $\mathbf{H}$ and $\chi$. Therefore, the data rate for user k is given
by

$$R_k(\mathbf{H}, \chi) = \sum_{i=1}^{M} I_k^i\, \chi_k \log(1 + \gamma_k^i) \qquad (4)$$

where $\gamma_k^i$ is the SINR of user k on the i-th beam after the Stage II user scheduling.

We define the conditional feedback cost $S(\mathbf{Q})$ and the average feedback cost $\bar{S}$ as
follows,

$$S(\mathbf{Q}) = \mathbb{E}\Big[\sum_k \chi_k \,\Big|\, \mathbf{Q}\Big] = \sum_k p_k(\mathbf{Q}), \quad \text{and} \quad \bar{S} = \mathbb{E}[S(\mathbf{Q})]. \qquad (5)$$

In addition, the minimum average feedback cost required to achieve the maximum queue stability
region $\mathcal{C}$ in the MU-MIMO system is denoted as $S^*$.

3) The Feedback Filtering Optimization: The feedback filtering control policy is derived using the
Lyapunov technique and is shown to be throughput optimal as follows.

Define $L(\mathbf{Q}) = \sum_k Q_k^2$ as the Lyapunov function. Then the one-step conditional
Lyapunov drift $\Delta L(\mathbf{Q}(t))$ is given by

$$\Delta L(\mathbf{Q}(t)) \triangleq \mathbb{E}\big[L(\mathbf{Q}(t+1)) - L(\mathbf{Q}(t)) \mid \mathbf{Q}(t)\big]. \qquad (6)$$

The following lemma establishes the relationship between the Lyapunov drift (6) and the queue
stability.

Lemma 1 (Lyapunov drift and the queue stability): Given positive constants V and $\epsilon$, the K
queues of the MU-MIMO system $\{Q_1(t), \ldots, Q_K(t)\}$ are stable if the following condition is
satisfied,

$$\Delta L(\mathbf{Q}(t)) + V S(\mathbf{Q}(t)) \le BK - \epsilon \sum_k Q_k(t) + V S^* \qquad (7)$$

for all t and all $\mathbf{Q}(t)$. The average queue length satisfies

$$\sum_k \bar{Q}_k \triangleq \limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \sum_k \mathbb{E}[Q_k(\tau)] \le \frac{BK + V S^*}{\epsilon} \qquad (8)$$

and the average feedback cost satisfies

$$\bar{S} = \limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{E}[S(\mathbf{Q}(\tau))] \le S^* + BK/V. \qquad (9)$$
■

Proof: The proof can be extended from [18, Lemma 1] by replacing the power cost function
with the feedback cost function $S(\mathbf{Q})$ defined in (5). ■

The results in Lemma 1 motivate us to minimize the Lyapunov drift in (7) to achieve the maximum
queue stability region. With this insight, we develop our feedback filtering control policy as
follows.

Feedback Filtering Control Algorithm (FFCA): Observing the current queue length $\mathbf{Q}(t)$,
users feed back their CSI according to the probability vector
$\mathbf{p}^*(\mathbf{Q}(t)) = \{p_1^*(\mathbf{Q}(t)), \ldots, p_K^*(\mathbf{Q}(t))\}$, where
$\mathbf{p}^*(\mathbf{Q}(t))$ is obtained from the solution of the following optimization problem,

$$\max_{\{p_k\}} \ \mathbb{E}\Big[\sum_{k=1}^{K} Q_k(t) R_k(\mathbf{H}, \chi) - V S(\mathbf{Q}(t))\Big]. \qquad (10)$$

The following theorem justifies the throughput optimality of the feedback filtering policy obtained
in (10).

Theorem 1 (Throughput optimality of the FFCA): The feedback control $\mathbf{p}^*(\mathbf{Q})$
given by the FFCA achieves the maximum stability region $\mathcal{C}$ in the MU-MIMO system. ■
Proof: Please refer to Appendix A for the proof. ■

From the results in Lemma 1, the parameter V in (10) trades off the average queue length (delay)
against the feedback cost. A large V reduces the average feedback cost in (9) but results in a
larger average queue length in (8). In the next section, we shall derive the FFCA solution
$\mathbf{p}^*(\mathbf{Q})$. Note that, due to the feedback filtering variable
$\chi \in \{0, 1\}^K$, evaluating the expectation in (11) has exponential complexity (w.r.t. K).
This makes the problem difficult to solve. However, by exploiting the specific problem structure,
we are still able to find the globally optimal solution to problem (10).

III. The Queue-Aware User Feedback Filtering Algorithm

In this section, we focus on deriving the FFCA solution for the feedback filtering problem in
(10). Using primal decomposition techniques, problem (10) can be transformed into the following
two subproblems.

• Inner subproblem:

$$V_I(S) = \max_{\{p_k\}} \ \mathbb{E}\Big[\sum_{k=1}^{K} Q_k R_k(\mathbf{H}, \chi)\Big] \qquad (11)$$
$$\text{subject to} \quad 0 \le p_k \le 1, \quad \forall k = 1, \ldots, K \qquad (12)$$
$$\sum_{k=1}^{K} p_k = S \qquad (13)$$

where S is an auxiliary variable with the physical meaning of the average number of feedback
users due to constraint (13).

• Outer subproblem:

$$V_{II} = \max_{S} \ V_I(S) - VS. \qquad (14)$$

In the following, we first derive the objective function in (11). We then solve the inner
subproblem (11) and the outer subproblem (14) separately.



A. The Average Data Rate of the Feedback Users 



In this section, we are interested in the expected user data rate $\mathbb{E}[R_k]$ (also denoted
as $\bar{R}_k$) in (11). Define
$\eta_k(S) = \mathbb{E}\big[R_k(\mathbf{H}, \chi) \mid \chi_k = 1, \sum_k \chi_k = S\big]$ as the
average data rate for user k, conditioned on S users feeding back to the BS (including user k). We
characterize $\eta_k(S)$ in the following lemma.

Lemma 2 (Data rate under deterministic feedback): Given the set of feedback users $\mathcal{F}$,
where $|\mathcal{F}| = S$, we have, for $k \in \mathcal{F}$,

$$\eta_k(S) \le M \int_0^{\infty} \log(1 + x)\, N f(x) F(x)^{NS-1}\, dx \triangleq \tilde{\eta}_k(S) \qquad (15)$$

where

$$F(x) = 1 - \frac{e^{-x/P}}{(1 + x)^{M-1}}$$

is the cumulative distribution function (CDF) of $\mathrm{SINR}^i_{k,n}$ in (2) and $f(x)$ is the
corresponding probability density function (PDF). Moreover, the upper bound is tight, with the gap
$\Delta_\eta = |\tilde{\eta}_k(S) - \eta_k(S)|$ bounded as $\Delta_\eta < (1 - \ldots)$.

Proof: Please refer to Appendix B for the proof. ■

Fig. 2 compares $\eta_k(S)$ and $\tilde{\eta}_k(S)$ for different numbers of users S. As
observed, $\eta_k(S) \approx \tilde{\eta}_k(S)$ even for a moderate number of users S.

Figure 2. Comparison between the numerical value $\eta_k(S)$ and its theoretical upper bound
$\tilde{\eta}_k(S)$ for $k \in \mathcal{F}$. Channel realizations were run to estimate
$\eta_k(S)$. As observed, $\eta_k(S) \approx \tilde{\eta}_k(S)$ even for a moderate number of
users S.
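
The upper bound in (15) is a one-dimensional integral and is easy to evaluate numerically. The sketch below does so with a trapezoid rule on a logarithmic grid. It assumes the standard random-beamforming SINR distribution $F(x) = 1 - e^{-x/P}/(1+x)^{M-1}$, which is an assumption here since the CDF is not legible in the source text; the function names and grid parameters are likewise illustrative.

```python
import math


def sinr_cdf(x, P, M):
    """Assumed SINR CDF: F(x) = 1 - exp(-x/P) / (1 + x)^(M - 1)."""
    return 1.0 - math.exp(-x / P) / (1.0 + x) ** (M - 1)


def sinr_pdf(x, P, M):
    """Derivative of sinr_cdf with respect to x."""
    return math.exp(-x / P) / (1.0 + x) ** M * ((1.0 + x) / P + M - 1)


def eta_upper(S, P, M, N, points=20000):
    """Evaluate M * int_0^inf log(1+x) N f(x) F(x)^(N S - 1) dx numerically."""
    x_max = 60.0 * P  # the exp(-x/P) tail is negligible beyond this point
    xs = [x_max * 10.0 ** (-9.0 * (1.0 - j / points)) for j in range(points + 1)]

    def integrand(x):
        return math.log1p(x) * N * sinr_pdf(x, P, M) * sinr_cdf(x, P, M) ** (N * S - 1)

    ys = [integrand(x) for x in xs]
    area = sum(0.5 * (ys[j] + ys[j + 1]) * (xs[j + 1] - xs[j]) for j in range(points))
    return M * area


# Per-user rate bound shrinks as more users share the M beams.
for s in (1, 2, 4, 8):
    print(s, round(eta_upper(s, P=1.0, M=1, N=1), 4))
```

For M = N = 1 and a single feedback user the integral reduces to the mean of log(1 + X) for an exponential X, which gives a convenient sanity check on the quadrature.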

B. Solution of the FFCA 

1) Solution to the inner subproblem: In the following, we utilize Lemma 2 to solve the inner
problem. Let $\Pi = \{\pi(1), \ldots, \pi(K)\}$ be a permutation of the user indices such that
$Q_{\pi(1)} \ge Q_{\pi(2)} \ge \cdots \ge Q_{\pi(K)}$. Note that the distribution of the binary
random variable $\chi_k$ is specified by $p_k$, and we have $\mathbb{E}[\chi_k] = p_k$. Given the
average feedback amount $\mathbb{E}[\sum_k \chi_k] = \sum_k p_k = S$, the FFCA solution
$\mathbf{p}$ of problem (11) is summarized in the following theorem.

Theorem 2 (The optimal user feedback probability $\{p_k\}$): The FFCA user feedback probability
$\{p_k\}$ that solves (11) is given by

$$p_{\pi(k)} = \begin{cases} 1, & 1 \le k \le \lfloor S \rfloor \\ S - \lfloor S \rfloor, & k = k_0 = \lfloor S \rfloor + 1 \\ 0, & \text{otherwise} \end{cases} \qquad (17)$$

and the average data rate for user $k < k_0$ is

$$\bar{R}_{\pi(k)} = \eta_{\pi(k)}(\lfloor S \rfloor)\big(1 - (S - \lfloor S \rfloor)\big) + \eta_{\pi(k)}(\lfloor S \rfloor + 1)(S - \lfloor S \rfloor) \qquad (18)$$
$$\bar{R}_{\pi(k_0)} = \eta_{\pi(k_0)}(\lfloor S \rfloor + 1)(S - \lfloor S \rfloor). \qquad (19)$$
■

Proof: Please refer to Appendix C for the proof. ■

The above result shows that, given the constraint on the average number of feedback users S, the
best strategy is to let the users with the S largest queues feed back, while keeping the other
users inactive.
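
The rule in (17) can be written directly as a short Python function. This is a minimal sketch: the name `feedback_probabilities` and the list-based interface are illustrative assumptions, and ties between equal queue lengths are broken by user index.

```python
import math


def feedback_probabilities(queues, s):
    """Feedback probabilities of (17) for an average feedback budget s.

    The floor(s) longest queues get probability 1, the next longest gets
    the fractional remainder s - floor(s), and all others get 0.
    """
    k_total = len(queues)
    if not 0 <= s <= k_total:
        raise ValueError("s must lie between 0 and the number of users")
    # Rank users by decreasing backlog; this plays the role of pi(1..K).
    order = sorted(range(k_total), key=lambda k: queues[k], reverse=True)
    whole = math.floor(s)
    p = [0.0] * k_total
    for rank, k in enumerate(order):
        if rank < whole:
            p[k] = 1.0
        elif rank == whole:
            p[k] = s - whole
    return p


print(feedback_probabilities([3, 9, 1, 7], 2.5))
```

By construction the probabilities sum to the budget S, so constraint (13) holds exactly.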



2) Solution to the outer subproblem: Based on the solution of the inner subproblem (11), we are
now ready to solve the outer subproblem (14) and determine the average feedback cost $S^*$. Using
the results for $\bar{R}_k$ in Theorem 2, we obtain the objective function of the outer subproblem
(14) as follows

$$\begin{aligned}
U(S) &= \mathbb{E}\Big[\sum_{k=1}^{\lfloor S \rfloor} Q_{\pi(k)} R_{\pi(k)} \,\Big|\, \chi_{\pi(k_0)} = 0\Big]\big(1 - p_{\pi(k_0)}\big) + \mathbb{E}\Big[\sum_{k=1}^{\lfloor S \rfloor + 1} Q_{\pi(k)} R_{\pi(k)} \,\Big|\, \chi_{\pi(k_0)} = 1\Big]\, p_{\pi(k_0)} - VS \\
&= \sum_{k=1}^{\lfloor S \rfloor} Q_{\pi(k)}\, \eta_{\pi(k)}(\lfloor S \rfloor)\big(1 - (S - \lfloor S \rfloor)\big) + \sum_{k=1}^{\lfloor S \rfloor + 1} Q_{\pi(k)}\, \eta_{\pi(k)}(\lfloor S \rfloor + 1)(S - \lfloor S \rfloor) - VS.
\end{aligned}$$

Note that $U(S)$ is a continuous function of S. The following result shows that it suffices to
consider only integer values of S.

Lemma 3 (Property of the outer subproblem): The solution $S^*$ to the outer subproblem (14) is an
integer. ■
Proof: Please refer to Appendix D for the proof. ■

With Lemma 3, the outer subproblem in (14) becomes

$$\max_{S} \ U(S) = \sum_{k=1}^{S} Q_{\pi(k)}\, \eta_{\pi(k)}(S) - VS \qquad (20)$$
$$\text{subject to} \quad S \in \{1, \ldots, K\}.$$

An intuitive observation on the outer subproblem (20) is that, while the term
$\sum_{k=1}^{S} Q_{\pi(k)}$ in (20) is increasing in S, the terms $\eta(S)$ and $-VS$ are
decreasing, so the objective $U(S)$ should first increase and then decrease after it reaches
$U(S^*)$. This motivates us to use a bisection algorithm (summarized in Algorithm 1) to solve the
outer subproblem (20), which takes at most $\log_2(K)$ steps to find $S^*$.

Algorithm 1 Proposed algorithm to find the user feedback filtering policy in Stage I.

1) Initialization: $S = \lfloor K/2 \rfloor$, $S_{\min} = 1$, $S_{\max} = K$.

2) Evaluate the condition in (21). If $U(S) > U(S - 1)$, then $S_{\min} = S$. Otherwise,
$S_{\max} = S$.

3) Repeat 2) by setting $S = \lfloor (S_{\min} + S_{\max})/2 \rfloor$, until
$S_{\max} - S_{\min} \le 1$.

4) Find the optimal user feedback probability vector $\mathbf{p}$ according to (17) in Theorem 2
by setting $S = S^*$ found from the above. The user feedback filtering policy in Stage I is thus
determined.

The following theorem guarantees that the bisection algorithm finds the global maximum of the
outer subproblem (20).



Theorem 3 (Global optimal solution to (10)): The global optimal solution to (10) is uniquely
determined by the following conditions

$$U(S^*) > U(S^* + 1) \quad \text{and} \quad U(S^*) > U(S^* - 1) \qquad (21)$$

where $S^* \in \{1, \ldots, K\}$. ■
Proof: Please refer to Appendix E for the proof. ■
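
The search for $S^*$ can be sketched as an integer bisection on the objective in (20). The code below is a variant of the bisection idea rather than a line-by-line transcription of Algorithm 1: it compares $U(S+1)$ with $U(S)$ at the midpoint and relies on the rise-then-fall shape of $U(S)$. The rate function `eta` is passed in as a callable and is assumed to be the same for all users; the surrogate used in the example is purely hypothetical.

```python
def objective(sorted_queues, s, eta, v):
    """U(S) = sum of the S largest queues times eta(S), minus V * S."""
    return sum(sorted_queues[:s]) * eta(s) - v * s


def find_s_star(queues, eta, v):
    """Integer bisection for the maximizer of U(S) over S in {1, ..., K}.

    Assumes U(S) first increases and then decreases, as argued for (20).
    """
    q = sorted(queues, reverse=True)
    lo, hi = 1, len(q)
    # Invariant: a maximizer of U lies in [lo, hi].
    while lo < hi:
        mid = (lo + hi) // 2
        if objective(q, mid + 1, eta, v) > objective(q, mid, eta, v):
            lo = mid + 1  # still on the rising side
        else:
            hi = mid  # at or past the peak
    return lo


# Hypothetical per-user rate that decays with the number of feedback users.
eta_demo = lambda s: s ** -0.5
print(find_s_star([1, 10, 6, 8, 1], eta_demo, v=0.5))
```

Each iteration halves the interval, so the number of objective evaluations grows logarithmically in K, in line with the step count stated for Algorithm 1.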
Using Theorems 2 and 3, the two timescale user scheduling algorithm can be summarized as
follows. We first determine the optimal user feedback amount $S^*$ by solving (20) using Algorithm
1. We then choose the $S^*$ users who have the longest queues among all the K users to be eligible
to feed back to the BS, and this feedback filtering decision $\{p_k^*(\mathbf{Q})\}$ in (17) is
broadcast to the network. As a result, the users feed back their effective SINRs based on
$\{p_k^*(\mathbf{Q})\}$, and the BS schedules the users based on their SINRs as described in the
Stage II policy.



IV. Large Deviation Delay Analysis for the Worst Case User 

In this section, we study the queueing delay performance of the proposed solution and illustrate
the gain of having a queue-aware policy. We are interested in the steady state distribution of the
worst-case queueing performance, i.e.,

$$\lim_{t \to \infty} \Pr\Big(\max_{1 \le k \le K} Q_k(t) > B\Big)$$

where B is the buffer size. We denote $Q_{\max}(t) = \max_k Q_k(t)$ as the maximum queue length
process and $Q_{\max}(\infty)$ as the steady state of $Q_{\max}(t)$. To overcome the technical
challenges associated with the delay analysis of the MU-MIMO system, we consider the large
deviation approach [19]. Specifically, we focus on the asymptotic overflow probability of the
maximum queue $Q_{\max}(\infty)$ over a large buffer size B, which is captured by the large
deviation decay rate of the tail probability of $Q_{\max}(\infty)$. In the next section, we
introduce the decay rate function for $Q_{\max}(\infty)$.

A. Large Deviation Decay Rate for $Q_{\max}(\infty)$ Using Sample Path Analysis

The large deviation decay rate function $I^*$ for the tail probability of $Q_{\max}(\infty)$ is
defined as

$$I^* \triangleq \lim_{B \to \infty} -\frac{1}{B} \log \Pr\big(Q_{\max}(\infty) > B\big). \qquad (22)$$

Note that, with the notion of the large deviation rate function, the queue overflow probability can
be written as

$$\Pr\big(Q_{\max}(\infty) > B\big) = e^{-I^* B + o(B)} \qquad (23)$$

where the component $I^*$ controls how fast the queue overflow probability drops as the buffer size
B grows. A larger decay rate $I^*$ corresponds to a better performance of the scheduling algorithm
in the sense of reducing the worst-case delay $Q_{\max}$ in the system. In the following, we find
the decay rate function $I^*$.

Consider a scaled sample path $q^B_{\max}(t) = \frac{1}{B} Q_{\max}(\lfloor Bt \rfloor)$, which
starts from $q^B_{\max}(0) = 0$ and reaches $q^B_{\max}(T_s) = 1$ for some $T_s$. Note that, with
the scaling, we have $\Pr(Q_{\max}(\infty) > B) = \Pr(q^B_{\max}(\infty) > 1)$. Let $w(t)$ be a
continuous sample path following $q^B_{\max}(t)$, i.e., $w(t) \approx q^B_{\max}(t)$.
Computing the decay rate $I^*$ corresponds to finding a "most likely" path $w(t)$ that overflows at
$w(T_s) = 1$. Using the large deviation principle [19], the decay rate function $I^*$ can be found
as follows

$$I^* = \inf \Big\{ \int_0^{T_s} l\big(w(\tau), w'(\tau)\big)\, d\tau \ : \ w(0) = 0,\ w(T_s) = 1,\ T_s > 0 \Big\}$$


where $l(w(t), w'(t))$ defined in (38) is the local rate function [19] following the path $w(t)$
(see Appendix G).

Solving the above variational calculus problem, we obtain the following results. Denote
$\mu_k(x) = \tilde{\eta}_k(S^*(x))/L$ for $k \in \mathcal{F}$, where $x = q_{\max}(t)$ for
$0 < t < T_s$. Note that, from (15), $\mu_k(x)$ is independent of k, and thus we write
$\mu(x) = \mu_k(x)$. Suppose the arrivals to all the users are i.i.d. with mean
$\mathbb{E}[A_k] = \lambda$ and logarithmic moment generating function
$g_A(\theta) = \log \Lambda_{A,k}(\theta)$. The following theorem summarizes the large deviation
decay rate for $Q_{\max}(\infty)$ under the proposed two timescale algorithm.

Theorem 4 (The large deviation decay rate for $Q_{\max}(\infty)$): Suppose $\lambda/\mu(x) < 1$ for
all x in [0, 1]. Then the large deviation decay rate for $Q_{\max}(\infty)$ is given by

$$I^* = \int_0^1 \theta^*(x)\, dx \qquad (24)$$

where $\theta^*(x)$ solves

$$g(x, \theta) = g_A(\theta) + \mu(x)\big(e^{-\theta} - 1\big) = 0.$$
■

Proof: Please refer to Appendix G for the proof. ■

The result in Theorem 4 gives the rate function to evaluate the exponential decay rate of the
overflow probability $\Pr(Q_{\max}(\infty) > B)$ under the proposed two timescale user scheduling
algorithm. Based on this result, we shall derive more insights in the later sections.
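
To make (24) concrete, the sketch below solves $g(x, \theta) = 0$ for its positive root by bisection and integrates $\theta^*(x)$ over [0, 1] with a midpoint rule. It is an illustration under stated assumptions: the arrival log-MGF and the service function $\mu(x)$ are supplied as callables, the function names are hypothetical, and the Poisson example with a constant $\mu$ is chosen only because its root is known in closed form, $\theta^* = \log(\mu/\lambda)$.

```python
import math


def theta_star(log_mgf, mu, theta_hi=50.0, tol=1e-12):
    """Positive root of g(theta) = log_mgf(theta) + mu * (exp(-theta) - 1)."""

    def g(theta):
        return log_mgf(theta) + mu * math.expm1(-theta)

    lo, hi = 1e-6, theta_hi
    # g is negative just right of 0 when the arrival rate is below mu.
    if g(lo) >= 0 or g(hi) <= 0:
        raise ValueError("no sign change: check that the arrival rate is below mu")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def decay_rate(log_mgf, mu_of_x, steps=200):
    """Midpoint-rule approximation of the integral of theta*(x) over [0, 1]."""
    h = 1.0 / steps
    return h * sum(theta_star(log_mgf, mu_of_x((j + 0.5) * h)) for j in range(steps))


lam = 1.0  # Poisson arrival rate (illustrative)
poisson_log_mgf = lambda theta: lam * math.expm1(theta)  # log E[exp(theta * A)]
rate = decay_rate(poisson_log_mgf, lambda x: 3.0)  # constant service rate
print(rate, math.log(3.0 / lam))
```

With a state-dependent $\mu(x)$, the same routine applies unchanged; only the callable passed as `mu_of_x` differs.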

B. Approximation of the Average User Data Rate 



From the expression of the rate function $I^*$ in (24), it is still hard to understand how the
delay performance relates to the system parameters. To further analyze $I^*$, we need a closed-form
expression for the average data rate of each user. In this section, we derive an asymptotically
accurate approximation for $\bar{R}_k$.

Lemma 4 (Asymptotic analysis of $\bar{R}_k$): The average data rate $\bar{R}_k$ for
$k \in \mathcal{F}$ has the following property,

$$\lim_{S \to \infty} \frac{\bar{R}_k}{p_k \frac{M}{S} \log(P \log NS)} = 1. \qquad (25)$$
■

Proof: This is a direct result of [2, Theorem 1] by considering $n = NS$ users, each with a single
receive antenna. ■

Lemma 4 shows that we can use $\bar{R}_k \approx p_k \frac{M}{S} \log(P \log NS)$ as an
asymptotically accurate approximation of the average data rate when S is large. Using the
approximated $\bar{R}_k$, we find an upper bound for $S^*$ in the following.



Lemma 5 (Upper bound of $S^*$): The upper bound of $S^*$ which solves (36) is given by

$$S^*(Q_{\max}; K) \le \min\Big\{ \frac{1}{N} e^{W(c_1)},\ K \Big\} \qquad (26)$$

where $c_1 = MNQ_{\max}/V$ and $W(x)$ is the Lambert W function [20] defined by
$W(x) e^{W(x)} = x$. The equality holds when $Q_{\pi(k)} = Q_{\max}$ for all k. ■
Proof: Please refer to Appendix F for the proof. ■
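
The bound in (26) is straightforward to evaluate once the Lambert W function is available. The sketch below computes the principal branch by Newton iteration and then evaluates the bound. It assumes $c_1 = MNQ_{\max}/V$, which is the value obtained by setting the derivative of $Q_{\max} M \log(P \log NS) - VS$ with respect to S to zero; this reading of $c_1$ is an assumption, as are the function names.

```python
import math


def lambert_w(x, tol=1e-12):
    """Principal branch W(x) for x >= 0 via Newton's method on w * exp(w) = x."""
    if x < 0:
        raise ValueError("this sketch only handles x >= 0")
    w = math.log1p(x)  # starting point at or above the root
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w


def s_star_upper_bound(q_max, M, N, V, K):
    """min{ exp(W(c1)) / N, K } with the assumed c1 = M * N * q_max / V."""
    c1 = M * N * q_max / V
    return min(math.exp(lambert_w(c1)) / N, K)


# The bound grows with the longest queue and is capped by the number of users.
for q in (1.0, 10.0, 100.0):
    print(q, s_star_upper_bound(q, M=4, N=2, V=1.0, K=1000))
```

When the cap K is not active, the returned value s satisfies s log(N s) = M q_max / V, which is the stationarity condition behind the assumed $c_1$.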
Remark 1 (Interpretation of $S^*$): The results show that, when $Q_{\max}$ is large, it is better
to have more user feedback to boost the system throughput. On the other hand, when $Q_{\max}$ is
small, we can have less user feedback and give higher priority to the urgent users. Note that,
according to Lemma 4, the average data rate $\bar{R}_k$ is a decreasing function of S. Thus the
upper bound of $S^*$ gives a lower bound of $\bar{R}_k$, i.e.,

$$\bar{R}_k(Q_{\max}) \ge p_k\big(S^*(Q_{\max})\big)\, M\, \frac{\log\big(P \log N S^*(Q_{\max})\big)}{S^*(Q_{\max})} \qquad (27)$$

where $S^*(Q_{\max}) \triangleq \frac{1}{N} \exp\big(W(c_1)\big)$ and $p_k(S^*(Q_{\max}))$ is
given by Theorem 2. Using this, we can find a lower bound expression for the decay rate function
$I^*$ in the next section.

C. Asymptotic Analysis and Comparisons with the CSI-only User Scheduling 

In this section, we derive some asymptotic results for $I^*$ to gain better insight into the
behavior of the worst-case delay. As a comparison, the delay performance under a CSI-only baseline
policy is also derived.

The CSI-only baseline algorithm assumes that each user k feeds back the SINR for the
$i^*(k,n)$-th beam on each antenna n, where
$i^*(k,n) = \arg\max_{1 \le i \le M} \mathrm{SINR}^i_{k,n}$. Then, for each beam i, the BS
schedules the user who has the highest SINR on beam i. The CSI-only baseline scheme corresponds
to a special case of the proposed two timescale user scheduling obtained by setting $\chi_k = 1$
for all k in Stage I.

In order to obtain more design insights from the results in Theorem 4, we consider a special
case where the arrivals follow Poisson distributions. Specifically, the MGF of $A_k$ is given by
$\Lambda_{A,k}(\theta) = e^{\lambda(e^{\theta} - 1)}$. We have the following results.

Corollary 1 (Decay rate for the CSI-only algorithm): Let $\mu_b = \frac{M \log(P \log NK)}{KL}$ and $\lambda_t = K\lambda$. For Poisson arrivals with $\lambda < \mu_b$, the large deviation decay rate of $Q_{\max}(\infty)$ under the CSI-only baseline algorithm can be expressed as

$$I^*_{\text{baseline}} = \log \frac{M \log(P \log NK)}{\lambda_t L}. \qquad (28)$$

■
Proof: Please refer to Appendix H for the proof. ■
Remark 2 (Interpretation of the results): We can observe that, under a fixed total arrival rate $\lambda_t$, the CSI-only baseline algorithm has a decay rate $I^* = \Theta(\log \log \log K)$ due to the standard multi-user diversity gain.
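To see how slowly the baseline decay rate in (28) grows with K, the expression can be evaluated directly. The sketch below assumes natural logarithms and normalized units (the values of `lam_total` and `L` in the usage note are placeholders, not the simulation settings of Section V); the function name is hypothetical.

```python
import math

def baseline_decay_rate(M, N, K, P, lam_total, L):
    """I*_baseline = log( M log(P log(N K)) / (lam_total * L) ), as in (28)."""
    mu_b = M * math.log(P * math.log(N * K)) / (K * L)  # per-user departure rate
    lam = lam_total / K                                  # per-user arrival rate
    if lam >= mu_b:
        raise ValueError("Corollary 1 requires lam < mu_b")
    return math.log(mu_b / lam)

# Growing K by seven orders of magnitude changes the rate only marginally.
rates = [baseline_decay_rate(M=4, N=2, K=K, P=10.0, lam_total=1.0, L=1.0)
         for K in (10, 10**4, 10**8)]
```

The list `rates` is increasing in K, but the increments are small, which is what the $\Theta(\log \log \log K)$ scaling in Remark 2 suggests.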

Similarly, we obtain the following results for the large deviation decay rate of $Q_{\max}(\infty)$ under the proposed two timescale user scheduling algorithm.

Corollary 2 (Decay rate for the proposed algorithm): Let $\mu_0 = \min_{x \in [0,1]} \mu_p(x)$, where $\mu_p(x) = \frac{M \log(P \log N S^*(x))}{L S^*(x)}$. Under Poisson arrivals with $\lambda < \mu_0$, the large deviation decay rate of $Q_{\max}(\infty)$ under the two timescale user scheduling algorithm can be expressed as

$$I^*_{\text{prop}} \geq (1 - \epsilon) \log K + \log \frac{M}{\lambda_t L} + \epsilon \log R_0 + C \qquad (29)$$

where $C = \int_{\epsilon}^{1} \left\{ \log\left[ N \log\big(P\, W(c_1(x))\big) \right] - W(c_1(x)) \right\} dx$ with $c_1(x)$ denoting $c_1$ of Lemma 5 at queue level $x$, $R_0 = \int_0^{\infty} \log(1 + Px)\, dF(x)$, and $\epsilon > 0$ is a small constant. ■
Proof: Please refer to Appendix I for the proof. ■
Based on the results in Corollaries 1 and 2, we conclude the following for the CSI-only user scheduling algorithm and the proposed two timescale algorithm.

• Large deviation decay rates $I^*_{\text{prop}} \gg I^*_{\text{baseline}}$ when the number of users K grows large. This demonstrates that it is important to utilize the queue information in the user scheduling algorithm to minimize the worst case delay.

• In addition, both of the schemes benefit from the multiuser diversity. The decay rate increases when the number of users increases, and the rate $I^*_{\text{prop}}$ increases faster than the baseline.

• Furthermore, both of the schemes benefit from the MU-MIMO channel. It is demonstrated that, when increasing the number of data streams M and the receive antennas N, the large deviation decay rates $I^*_{\text{prop}}$ and $I^*_{\text{baseline}}$ both increase as $\Theta(\log M \log \log N)$.

In summary, by carefully exploiting the queue information in the stage I feedback filtering, the 
proposed MU-MIMO algorithm has significant delay performance gain compared with conventional 
CSI-only schemes. 

V. Numerical Results 

In this section, we simulate the queueing delay performance of the proposed two timescale user scheduling algorithm. We consider a MU-MIMO system with K users, and packets arrive at the queue of each user according to a Poisson distribution with rate $\lambda = \lambda_t / K$, where the total arrival rate is $\lambda_t = 7500$ packets/second. Each packet has L = 8000 bits. The system bandwidth is 10 MHz and the SNR is 10 dB. The numbers of transmit and receive antennas are M = 4 and N = 2, respectively. The scheduling time slot is $\tau = 1$ ms and the simulation is run over $T_{\text{tot}} = 100$ seconds. We compare the performance of the proposed algorithm against the following reference baselines.

• Baseline 1: CSI-only user scheduling (CSIO) [6]. At each time slot, all the users feed back the CSI to the BS, and the BS schedules a set of users who respectively have the highest SINR on each beam (see Section IV-C).



Baseline 2: CSI-only user scheduling with limited feedback (CSIO-LF) fE\. The scheme is 
similar to baseline 1 except that a user feeds back to the BS only when its SINR exceeds a threshold $t_{\mathrm{SINR}} = 2$ dB.

Baseline 3: Proportional fair user scheduling (PFS) fV\. At each time slot, all the users 
feed back the CSI to the BS, and the BS transmits data to the users using proportional fair scheduling with a window size of 100 ms.
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• Baseline 4: Max weighted queue user scheduling (MWQ) p3| . At each time slot, all the 
users feed back their CSI to the BS, and the BS selects a set of users so that the instantaneous queue-weighted sum rate $\sum_k Q_k R_k$ is maximized.
Note that the user scheduling problem in baseline 4 has much higher complexity and requires feedback from all the users. Hence, baseline 4 serves for performance benchmarking purposes only.
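As a quick sanity check of the traffic parameters stated at the beginning of this section, the offered load can be computed directly. This is simple arithmetic on the quoted values; K = 40 is the user count used for Fig. 3, and the variable names are illustrative.

```python
# Parameters quoted in the text; K = 40 is the user count used for Fig. 3.
lam_total = 7500.0      # total arrival rate, packets/second
L_bits = 8000           # packet size, bits
bandwidth_hz = 10e6     # system bandwidth, Hz
slot_s = 1e-3           # scheduling slot, seconds
K = 40

lam_per_user = lam_total / K                             # packets/second per user
arrivals_per_slot = lam_per_user * slot_s                # mean packets per user per slot
offered_load_bps = lam_total * L_bits                    # bits/second
offered_spectral_eff = offered_load_bps / bandwidth_hz   # bits/second/Hz
```

With these numbers each user receives 187.5 packets/second on average, and the system must carry 60 Mbit/s, i.e., 6 bits/second/Hz over the 10 MHz bandwidth.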

A. Queueing Performance and Feedback Comparisons 

Fig. 3 shows the overflow probability for the worst case queue, $\Pr(Q_{\max}(\infty) > B)$, versus the buffer size B. The number of users is K = 40. The feedback policy $\chi$ is updated every T = 1, 5, 10 time slots. The proposed scheme significantly outperforms baselines 1-3. It also has similar performance to baseline 4. Fig. 4 shows the average feedback amount S (defined as the average number of users that feed back to the BS at each time slot) versus the number of users K. The feedback amount of the proposed scheme is less than those of all the baselines. Note that although baseline 4 has a smaller worst case queue, it requires all the users to feed back to the BS.

B. Large Deviation Decay Rate for a Large Number of Users 

Fig. 5 shows the large deviation decay rate versus the number of users. The decay rates $I^*$ in (22) are evaluated at buffer size $B_{0.05}$, where the overflow probability is $\Pr(Q_{\max}(\infty) > B_{0.05}) = 0.05$. The decay rate for the proposed scheme grows much faster with the number of users K than those of baselines 1-3. This is consistent with the result in Corollary 2.

VI. Conclusions 

In this paper, we proposed a novel two timescale delay-aware user scheduling algorithm for the 
MU-MIMO system. The policy consists of a queue-aware mobile-driven feedback filtering stage and a 
dynamic SINR-based user scheduling stage. The queue-aware feedback filtering control algorithm in 
stage I was derived through solving an optimization problem. Under the proposed two timescale user 
scheduling algorithm, we also evaluated the queueing delay performance for the worst case user using 
the sample path large deviation analysis. The large deviation decay rate for the proposed algorithm, which scales as $\mathcal{O}(\log K)$, was shown to be much larger than that of a CSI-only user scheduling algorithm, which means that the proposed scheme performs better in reducing the worst case delay. The numerical results demonstrated a significant performance gain over the CSI-only algorithm and a large feedback reduction over the MWQ algorithm.



Figure 3. The overflow probability for the worst case queue $\Pr(Q_{\max}(\infty) > B)$ versus the buffer size B (horizontal axis: queue length B in packets). The number of users is K = 40. The feedback policy $\chi$ in stage I is updated every T = 1, 5, 10 time slots. The proposed scheme significantly outperforms baselines 1-3. It also performs close to baseline 4.



Appendix A 

Throughput Optimal Property of the FFCA Policy 



In this section, we prove that the feedback control policy (10) can achieve the maximum stability region.

Consider the queue dynamic in (3). By squaring both sides of the equation and using the property $[\max\{0, x\}]^2 \leq x^2$, we obtain, $\forall k$,

$$Q_k^2(t+1) \leq Q_k^2(t) + \mu_k^2(t) - 2 Q_k(t)\big(\mu_k(t) - A_k(t)\big) + A_k^2(t) \qquad (30)$$

where we simplify the notation by writing $\mu_k(\mathbf{Q}(t), \mathcal{H}(t))$ as $\mu_k(t)$. Following the definition of the conditional Lyapunov drift $\Delta L(\mathbf{Q}(t))$, taking conditional expectations and summing over all k inequalities in (30) yields

$$\Delta L(\mathbf{Q}(t)) \leq \mathbb{E}\Big[\sum_k \mu_k^2(t) + A_k^2(t) \,\Big|\, \mathbf{Q}(t)\Big] - 2 \sum_k Q_k(t)\, \mathbb{E}\big[\mu_k(t) - A_k(t) \,\big|\, \mathbf{Q}(t)\big]. \qquad (31)$$

Denote positive constants $\mu_{\max}$ and $A_{\max}$ such that

$$\mathbb{E}\big[\mu_k^2(t) \,\big|\, \mathbf{Q}(t)\big] \leq \mu_{\max}^2 \quad \text{and} \quad \mathbb{E}\big[A_k^2(t) \,\big|\, \mathbf{Q}(t)\big] \leq A_{\max}^2.$$

Let $B = \mu_{\max}^2 + A_{\max}^2$. The drift (31) is bounded by

$$\Delta L(\mathbf{Q}(t)) \leq BK - 2 \sum_k Q_k(t) \big( \mathbb{E}[\mu_k(t) \,|\, \mathbf{Q}(t)] - \lambda_k \big). \qquad (32)$$

Figure 4. The average feedback amount S versus the number of users K. The feedback threshold of baseline 2 is $t_{\mathrm{SINR}} = 2$ dB. The feedback amount of the proposed scheme is much less than those of all the baselines. Note that although baseline 4 (MWQ) has a smaller worst case queue, it requires all the users to feed back to the BS.



Suppose now that the arrival rate vector $\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_K)$ is strictly interior to the stability region $\mathcal{C}$ (Definition 2) such that $\boldsymbol{\lambda} + \epsilon \mathbf{1} \in \mathcal{C}$, for $\epsilon > 0$. Since channel states are i.i.d. over time slots, using the result in [18, Corollary 1], it follows that there exists a stationary randomized feedback control policy that schedules users to feed back independently of the queue $\mathbf{Q}(t)$ and yields

$$\mathbb{E}[\mu_k(t) \,|\, \mathbf{Q}(t)] = \mathbb{E}[R_k(t)] \geq \lambda_k + \epsilon, \quad \forall k$$

$$\mathbb{E}[S(\mathbf{Q}(t)) \,|\, \mathbf{Q}(t)] = S(\epsilon).$$

Because the stationary policy is simply a particular feedback policy, and the FFCA maximizes the term $\sum_k \mathbb{E}[Q_k(t) R_k(t)]$, the right hand side of (32) under FFCA is less than or equal to the resulting value under the stationary policy. Therefore, we have

$$\Delta L(\mathbf{Q}(t)) + V\, \mathbb{E}[S(\mathbf{Q}(t)) \,|\, \mathbf{Q}(t)] \leq BK - 2\epsilon \sum_k Q_k(t) + V S(\epsilon).$$

Notice that $S(\epsilon) \leq K$, which is the maximum feedback cost by the definition of the cost function. Using the results in Lemma 1, it follows that the time-averaged queue lengths satisfy $\sum_k \overline{Q}_k \leq \frac{BK + V S(\epsilon)}{2\epsilon} \leq \frac{BK + VK}{2\epsilon} < \infty$, which proves that the FFCA policy stabilizes all the queues.

Figure 5. The large deviation decay rate versus the number of users. The decay rates $I^*$ in (22) are evaluated at buffer size $B_{0.05}$, where the overflow probability is $\Pr(Q_{\max}(\infty) > B_{0.05}) = 0.05$. The decay rate for the proposed scheme grows much faster with the number of users K than that of baselines 1-3. Note that although baseline 4 performs the best, it requires all the users to feed back to the BS.
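The squared-queue inequality (30), on which the drift argument rests, can be checked numerically. The sketch below assumes the common queue recursion $Q_k(t+1) = \max\{Q_k(t) - \mu_k(t), 0\} + A_k(t)$ for the dynamic in (3), which is not restated in this appendix, and draws nonnegative queue lengths, service amounts and arrivals at random. The function names are illustrative.

```python
import random

def next_queue(q, mu, a):
    """One step of the queue recursion assumed here: serve up to mu, then add a arrivals."""
    return max(q - mu, 0.0) + a

def drift_bound(q, mu, a):
    """Right-hand side of (30)."""
    return q * q + mu * mu - 2.0 * q * (mu - a) + a * a

random.seed(0)
violations = 0
for _ in range(100_000):
    q, mu, a = (random.uniform(0.0, 50.0) for _ in range(3))
    # Small tolerance guards against floating-point rounding only.
    if next_queue(q, mu, a) ** 2 > drift_bound(q, mu, a) + 1e-9:
        violations += 1
```

Under the assumed recursion the inequality holds for every nonnegative triple, so `violations` stays at zero; summing (30) over k is then what produces the bound (31).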

Appendix B 

Data Rate for the User with Maximum SINR 

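From the effective SINR expression, as $\phi_i$ are unitary vectors, the beam gains $|\mathbf{h}_{k,n} \phi_i|^2$ are i.i.d. over $k$, $n$ and $i$, and chi-square distributed with 2 degrees of freedom. Consequently, the interference term $\sum_{j \neq i} |\mathbf{h}_{k,n} \phi_j|^2$ is chi-square distributed with $2M - 2$ degrees of freedom. Thus, the PDF $f(x)$ and CDF $F(x)$ of $\mathrm{SINR}^i_{k,n}$ are given by [12]

$$f(x) = \frac{e^{-x/P}}{(1+x)^M} \left[ \frac{1}{P}(1+x) + M - 1 \right]$$

and

$$F(x) = 1 - \frac{e^{-x/P}}{(1+x)^{M-1}},$$

respectively. Thus, for a particular user $k \in \mathcal{F}$, as $\mathrm{SINR}^i_{k,n}$ are i.i.d. over different users k and antennas n, the probability that user k has the largest SINR on the i-th beam and the n-th antenna is given by $1/NS$. The corresponding CDF of the maximum SINR is

$$\Pr\Big(\max_{k \in \mathcal{F},\, 1 \leq n \leq N} \mathrm{SINR}^i_{k,n} < x\Big) = \prod_{k,n} \Pr\big(\mathrm{SINR}^i_{k,n} < x\big) \qquad (33)$$

$$= [F(x)]^{NS} \qquad (34)$$

and hence, the data rate can be given by

$$R = \mathbb{E}\Big[\log\Big(1 + \max_{k \in \mathcal{F},\, 1 \leq n \leq N} \mathrm{SINR}^i_{k,n}\Big)\Big] = \int_0^{\infty} \log(1+x)\, d\big(F(x)\big)^{NS} = \int_0^{\infty} \log(1+x)\, NS f(x) F(x)^{NS-1} dx.$$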

As each user is equipped with N antennas, the average data rate for user $k \in \mathcal{F}$, given $|\mathcal{F}| = S$, is

$$\eta(S) \leq \sum_{n=1}^{N} \sum_{i=1}^{M} \Pr\Big(\mathrm{SINR}^i_{k,n} = \max_{k' \in \mathcal{F},\, 1 \leq n' \leq N} \mathrm{SINR}^i_{k',n'}\Big) R = M \int_0^{\infty} \log(1+x)\, N f(x) F(x)^{NS-1} dx$$

where this is an upper bound since there is a small probability that a user has the maximum SINR for two or more beams on one antenna. As the feedback policy only allows the user to pick one beam per antenna to feed back, this event decreases the throughput.

However, the probability that a user has the maximum SINR for two or more beams on one antenna is very small, as shown in the following lemma.

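Lemma 6: Given $\max_{k \in \mathcal{F},\, 1 \leq n \leq N} \mathrm{SINR}^i_{k,n} > 1$, $\forall i = 1, \ldots, M$, it is impossible for a user to have the maximum SINR for two or more beams on one antenna, i.e., for $(k^*, n^*) = \arg\max_{k \in \mathcal{F},\, 1 \leq n \leq N} \mathrm{SINR}^i_{k,n}$, we have $\mathrm{SINR}^i_{k^*,n^*} = \max_{1 \leq j \leq M} \mathrm{SINR}^j_{k^*,n^*}$, $\forall i$. ■
Proof: Given $\mathrm{SINR}^i_{k^*,n^*} = \max_{k \in \mathcal{F},\, 1 \leq n \leq N} \mathrm{SINR}^i_{k,n} > 1$, we have

$$|\mathbf{h}_{k^*,n^*} \phi_i|^2 > \frac{1}{P} + \sum_{j \neq i} |\mathbf{h}_{k^*,n^*} \phi_j|^2.$$

Therefore, for any $j \neq i$,

$$\mathrm{SINR}^j_{k^*,n^*} = \frac{|\mathbf{h}_{k^*,n^*} \phi_j|^2}{\frac{1}{P} + \sum_{l \neq j} |\mathbf{h}_{k^*,n^*} \phi_l|^2} < \frac{|\mathbf{h}_{k^*,n^*} \phi_j|^2}{|\mathbf{h}_{k^*,n^*} \phi_i|^2} < 1$$

which shows that $\mathrm{SINR}^i_{k^*,n^*}$ is the maximum for user $k^*$ on antenna $n^*$ over all the M beams. ■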
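In the following, we evaluate the gap $\Delta\eta$ between the upper bound above and the actual average data rate. Suppose the maximum SINRs for beams i and j are both on antenna $n^*$ of user $k^*$, and assume $\mathrm{SINR}^i_{k^*,n^*} > \mathrm{SINR}^j_{k^*,n^*}$. Then we must have $\mathrm{SINR}^j_{k^*,n^*} < 1$ from the above lemma. Therefore, the loss from not reporting $\mathrm{SINR}^j_{k^*,n^*}$ due to the stage I policy is bounded by

$$\Delta\eta \leq M \Pr\Big(\max_{k \in \mathcal{F},\, 1 \leq n \leq N} \mathrm{SINR}^j_{k,n} < 1\Big) \cdot \log\big(1 + \mathrm{SINR}^j_{k^*,n^*}\big) \leq M \Pr\big(\mathrm{SINR}^j_{k,n} < 1\big)^{NS} \log 2 = M \left[1 - \frac{e^{-1/P}}{2^{M-1}}\right]^{NS} \log 2.$$ ■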
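The SINR distribution and the rate integral used in this appendix can be evaluated numerically. The following is a minimal sketch under stated assumptions: natural logarithms, a plain trapezoidal rule truncated at `x_max` (an implementation choice made here, adequate because the integrand decays like $e^{-x/P}$), and illustrative function names and parameter values.

```python
import math

def sinr_cdf(x, P, M):
    """F(x) = 1 - exp(-x/P) / (1+x)^(M-1)."""
    return 1.0 - math.exp(-x / P) / (1.0 + x) ** (M - 1)

def sinr_pdf(x, P, M):
    """f(x) = exp(-x/P) / (1+x)^M * ((1+x)/P + M - 1)."""
    return math.exp(-x / P) / (1.0 + x) ** M * ((1.0 + x) / P + M - 1)

def rate_upper_bound(S, P, M, N, x_max=200.0, steps=20_000):
    """Trapezoidal estimate of M * int_0^inf log(1+x) N f(x) F(x)^(N S - 1) dx."""
    h = x_max / steps
    total = 0.0
    for i in range(steps + 1):
        x = i * h
        y = math.log1p(x) * N * sinr_pdf(x, P, M) * sinr_cdf(x, P, M) ** (N * S - 1)
        total += y if 0 < i < steps else 0.5 * y  # half weight at both endpoints
    return M * total * h
```

For instance, comparing `rate_upper_bound(5, 10.0, 4, 2)` with `rate_upper_bound(20, 10.0, 4, 2)` shows the per-user rate shrinking as more users feed back, while S times the rate still grows, which is the multiuser diversity effect discussed in the main text.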

Appendix C 
Proof of Theorem 2

According to Lemma 2, the objective in (11) can be written as $f(\mathbf{p}) = \sum_k Q_k(t)\, \mathbb{E}[\chi_k \eta(s)]$, where $\mathbf{p} = \{p_1, \ldots, p_K\}$ determines the distributions of $\chi_k$ and $s$. We approach this problem by considering the following two cases.

Case 1: We consider $p_k = 1$ for some $k \in \mathcal{K}_1 \subset \mathcal{K}$, with $|\mathcal{K}_1| = S_1 < S$, and $p_k < 1$ for some $k \in \mathcal{K}_0 = \mathcal{K} \setminus \mathcal{K}_1$, where $\mathcal{K}$ is the set of all users. We thus apply the Poisson approximation to determine the distribution of $s$, where $s = S_1 + s'$ and $s'$ follows a Poisson distribution with mean $\nu = S - S_1$, and thus, we have $\sum_{k \in \mathcal{K} \setminus \mathcal{K}_1} p_k = \nu$.

Using the Poisson approximation for the distribution of $s$ decouples $\chi_k$ and $s$. Hence we have $\mathbb{E}[\chi_k \eta(s)] \approx p_k \bar{\eta}(s)$, where $\bar{\eta}(s) = \mathbb{E}[\eta(s)]$ is independent of $p_k$. Therefore, the inner subproblem becomes a linear programming problem as

$$\max_{\{p_k\}} \sum_{k=1}^{K} p_k Q_k \bar{\eta}(s)$$

subject to constraints (12) - (13)

where the solution is given by $p_{\pi(k)} = 1$, $1 \leq k \leq \lfloor S \rfloor$; $p_{\pi(k_0)} = S - \lfloor S \rfloor$, $k_0 = \lfloor S \rfloor + 1$; and $p_{\pi(k)} = 0$ otherwise. Here, the permutation $\Pi = \{\pi(k)\}$ is such that $Q_{\pi(1)} \geq \cdots \geq Q_{\pi(K)}$. However, the solution violates the assumption of $p_k < 1$ for $k \notin \mathcal{K}_1$. This leads to the second case as follows.

Case 2: We consider $p_k = 1$ for some $k \in \mathcal{K}_1 \subset \mathcal{K}$, with $|\mathcal{K}_1| = \lfloor S \rfloor = S_1$, and $p_k = S - \lfloor S \rfloor$ for some $k = k_0$. The inner subproblem then becomes

$$\max_{\{p_k\},\, k_0} \; \sum_{k \neq k_0} p_k Q_k\, \eta(S_1)\, (1 - p_{k_0}) + \Big( \sum_{k \neq k_0} p_k Q_k + Q_{k_0} \Big) \eta(S_1 + 1)\, p_{k_0}$$

subject to $p_k \in \{0, 1\}$, $\forall k \neq k_0$

$p_k = S - S_1$, $k = k_0$

which is a combinatorial problem and the solution is simply given by $p_{\pi(k)} = 1$, $1 \leq k \leq \lfloor S \rfloor$; $p_{\pi(k_0)} = S - \lfloor S \rfloor$, $k_0 = \lfloor S \rfloor + 1$; and $p_{\pi(k)} = 0$ otherwise.

Combining the discussions on Case 1 and Case 2, we obtain the results for $\{p_k\}$. In addition, using such results, we also obtain $R_k = \mathbb{E}[\chi_k \eta(s)]$ in Theorem 2.


Appendix D 
Proof of Lemma 3

Taking the derivative of the objective function $U(S)$, we obtain

$$\frac{d}{dS} U(S) = -\sum_{k=1}^{\lfloor S \rfloor} Q_{\pi(k)}\, \eta(\lfloor S \rfloor) + \sum_{k=1}^{\lfloor S \rfloor + 1} Q_{\pi(k)}\, \eta(\lfloor S \rfloor + 1) - V.$$

It is observed that, given any integer $S_0$, the gradient $\frac{d}{dS} U(S)$ remains constant for any $S \in (S_0, S_0 + 1)$. If $\frac{d}{dS} U(S) = 0$, we can consider $S_0$ or $S_0 + 1$ to be the local maximum. If $\frac{d}{dS} U(S) \neq 0$, using the optimality condition [21], $S \in (S_0, S_0 + 1)$ cannot be the maximum. We conclude that the maximum should be an integer.

Appendix E 
Proof of Theorem 3

In the following, we use the notations $s$ and $S$ interchangeably. Define a continuous function $\eta_c(s) = \eta(s)$. Unlike $\eta(s)$, we allow $\eta_c(s)$ to be defined on all positive real numbers. Let $\mathcal{I}$ denote the space of positive, increasing, concave functions, i.e.,

$$\mathcal{I} = \left\{ \phi \in C^2(0, +\infty) : \phi > 0,\ \phi' > 0,\ \phi'' < 0 \right\}.$$

Given $g \in \mathcal{I}$, define $G(s) = g(s)\eta_c(s) - Vs$. We have the following result.

Lemma 7: A sufficient condition for the problem (20) having a unique local maximum in $s = \{1, \ldots, K\}$ is that there is a unique local supremum of $G(s)$ in $s \in (0, K)$ for any $g \in \mathcal{I}$. ■
Proof: Note that, since $\{Q_{\pi(k)}\}$ is in decreasing order, $g_Q(S) = \sum_{k=1}^{S} Q_{\pi(k)}$ is increasing, but the increment $g_Q(S+1) - g_Q(S)$ is decreasing. Therefore, given any $\{Q_{\pi(k)}\}$, there exists a function $g \in \mathcal{I}$ such that $g(s) = g_Q(s)$ for $s = \{1, \ldots, K\}$. As such, the function $G(s)$ for $s \in (0, K)$ is an interpolation of the objective function $U(s)$, where $G(s) = U(s)$ for $s = \{1, \ldots, K\}$.

Therefore, it is straightforward that if $G(s)$ has a unique local supremum for $s \in (0, K)$, so does $U(S)$ for $s = \{1, \ldots, K\}$. ■

To show that $G(s)$ has a unique local supremum, it suffices to show that $G(s)$ is concave, which is equivalent to showing that

$$G''(s) = g''(s)\eta_c(s) + 2 g'(s)\eta_c'(s) + g(s)\eta_c''(s) < 0.$$

From the property of $g \in \mathcal{I}$, we have $g'(s) s < g(s)$. Thus

$$G''(s) < g''(s)\eta_c(s) + \frac{g(s)}{s}\left[ 2\eta_c'(s) + s\,\eta_c''(s) \right]. \qquad (35)$$

The first term is negative by the definition of $g \in \mathcal{I}$. In the second term, $\frac{g(s)}{s}$ is positive. Now, let $\Gamma(s) = 2\eta_c'(s) + s\,\eta_c''(s)$. Note that, from (15), $\eta_c(s)$ is twice differentiable on $s \in (0, +\infty)$, and we have the following two equations

$$\eta_c'(s) = M \int_0^{\infty} \log(1+x)\, N^2 f(x) \log[F(x)]\, F(x)^{Ns-1} dx,$$

and

$$\eta_c''(s) = M \int_0^{\infty} \log(1+x)\, N^3 f(x) \big(\log[F(x)]\big)^2 F(x)^{Ns-1} dx.$$

Note that, for $N = 1$, $\Gamma(s; N = 1) \to 0$ as $s \to \infty$. It is also easy to verify that $\Gamma(s; N = 1) < 0$ for all $s > 0$. For $N > 1$, let $t = Ns$. From the above two equations, we have $\Gamma(s; N) = N^2 \Gamma(t; N = 1) < 0$. With $\Gamma(s) < 0$, we have $G''(s) < 0$ in (35). Hence $G(s)$ is concave and has a global supremum in $(0, K)$. Using Lemma 7, we have proven the result.

Appendix F 
Proof of Lemma 5

Consider an upper bound ordered queue length profile as follows,

$$\bar{Q}_{\pi(1)} = Q_{\max}; \quad \bar{Q}_{\pi(j)} = Q_{\max}\left(1 - \delta\, \frac{j-1}{K}\right),$$

where $\delta > 0$ is chosen such that $Q_{\pi(j)} \leq \bar{Q}_{\pi(j)}$ for all $j = \{1, \ldots, K\}$.

Note that in the outer subproblem (20), the term $\sum_{k=1}^{S} Q_{\pi(k)}$ is increasing with S while the term $\eta_{\pi(k)}(S)$ is decreasing. Hence using the upper bound queue length $\bar{Q}_{\pi(k)}$ yields an upper bound solution $\tilde{S}^*$ to the outer subproblem (20).

We first solve the outer subproblem (20) using the upper bound queue lengths $\bar{Q}_{\pi(k)}$ approximated above and the approximated average data rate $R_k = p_k \eta_k(S)$ from Lemma 4. The problem now becomes

$$\max_S \; g(S) = \frac{Q_{\max}}{2K} (2K + \delta - \delta S)\, M \log(P \log NS) - VS. \qquad (36)$$

It can be shown that $g(S)$ is concave. Taking the derivative of $g(S)$, we obtain

$$g'(S) = \frac{M Q_{\max}}{2K} \left[ \frac{2K + \delta - \delta S}{S \log NS} - \delta \log(P \log NS) \right] - V.$$

Setting $g'(S) = 0$, we have

$$S \log NS = \frac{M Q_{\max}}{V} \left[ 1 - \frac{\delta}{2K}\, S \log NS \left( \log(P \log NS) + \frac{1}{\log NS} - \frac{1}{S \log NS} \right) \right].$$

Therefore, we have

$$N S \log NS \leq \frac{M N Q_{\max}}{V} = c_1$$

for $S \geq 3$ and all $\delta > 0$. Thus we can just take $\tilde{S}^* \leq \frac{1}{N} e^{W(c_1)}$. Note that, under $\delta \to 0$,

$$\frac{\delta}{2K} \left( \log(P \log NS) + \frac{1}{\log NS} - \frac{1}{S \log NS} \right) \to 0,$$

which means the upper bound is achieved when $Q_{\pi(k)} \approx Q_{\max}$.

As a result, we have $S^* \leq \tilde{S}^* \leq \frac{1}{N} e^{W(c_1)}$.
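The relation between the maximizer of (36) and the bound can be illustrated numerically. The sketch below is not part of the proof: it uses the objective of (36) as written above, finds its best integer S on a grid starting from S = 3, and obtains the bound by solving $S \log NS = M Q_{\max} / V$ with bisection instead of the Lambert W function. All parameter values (including $\delta$ and $V$) and function names are illustrative assumptions.

```python
import math

def g36(S, q_max, M, N, P, V, K, delta):
    """Objective of (36) for S with P * log(N * S) > 0."""
    return (q_max / (2.0 * K) * (2.0 * K + delta - delta * S)
            * M * math.log(P * math.log(N * S)) - V * S)

def bound_by_bisection(q_max, M, N, V, lo=1.0, hi=1e9):
    """Solve S log(N S) = M q_max / V for S; the left side is increasing for N S >= 1."""
    target = M * q_max / V
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid * math.log(N * mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

params = dict(q_max=50.0, M=4, N=2, P=10.0, V=5.0, K=40, delta=0.5)
best_S = max(range(3, params["K"] + 1), key=lambda S: g36(S, **params))
bound = bound_by_bisection(params["q_max"], params["M"], params["N"], params["V"])
```

With these values `best_S` falls below `bound`, and letting `delta` shrink toward zero moves the maximizer up toward the bound, matching the statement that equality is approached when all queues are close to $Q_{\max}$.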

Appendix G 
Large deviation with sample path analysis 

In this section, we give a brief introduction to the sample path large deviation analysis and derive the proof of Theorem 4.

A. The Large Deviation Principle 

Consider the scaled sample path $q_{\max}(t) = \frac{1}{B} Q_{\max}(\lfloor Bt \rfloor)$, where the jumps can be given by¹

$$q_{\max}(t) - q_{\max}(s) = \frac{1}{B} \sum_{\tau = \lfloor Bs \rfloor}^{\lfloor Bt \rfloor} A_{m(\tau)}(\tau) - \frac{1}{B} \sum_{\tau = \lfloor Bs \rfloor}^{\lfloor Bt \rfloor} D_{m(\tau)}(\tau)$$

for $0 < s < t < T_s$, where $m(\tau) = \arg\max_k Q_k(\tau)$ denotes the index of the maximum queue at time $\tau$. Note that (at least when $|t - s|$ is small), the jump $q_{\max}(t) - q_{\max}(s)$ is a sum of independent variables $v(\tau) = A_{m(\tau)} - D_{m(\tau)}$. Consider that $q_{\max}(t)$ follows a continuous sample path $w(t)$, as $q_{\max}(t) \approx w(t)$. We write the random jump as $v(w(t))$, since it depends on the state $w(t)$.

¹ Here, for ease of discussion, we assume the identity $q_{\max}(\tau + \frac{1}{B}) - q_{\max}(\tau) = \frac{1}{B} A_{m(\tau)} - \frac{1}{B} D_{m(\tau)}$ holds on the boundary, where the maximum queue index changes, i.e., $m(\tau) \neq m(\tau + \frac{1}{B})$. Note that, with the fluid approximation, such a boundary effect (which violates the above equality) vanishes in the scaled sample path $q_{\max}$ when B becomes large (and hence the jumps become smaller).
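Define the function

$$I_0 = \inf\left\{ \int_0^{T_s} l\big(w(\tau), w'(\tau)\big)\, d\tau : w(0) = 0,\ w(T_s) = 1,\ T_s > 0 \right\} \qquad (37)$$

where

$$l\big(x = w(\tau),\, y = w'(\tau)\big) = \sup_{\theta} \{\theta y - g(x, \theta)\} \qquad (38)$$

is the local rate function, and

$$g(x, \theta) = \log \mathbb{E}\big[e^{\theta v(x)}\big]$$

is the logarithm moment generating function (LMF) of the random variable $v(x)$.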

We consider the escape time $t_B = \inf\{t > 0 : q_{\max}(t) \geq 1\}$, at which the process overflows. The following theorem characterizes the large deviation principle for $t_B$.

Theorem 5: (Large deviation principle for $t_B$) Suppose $\log v(x)$ is Lipschitz continuous on [0, 1]. We have the following,

(i) For each $\epsilon > 0$, $\lim_{B \to \infty} \Pr\big(\frac{1}{B} \log t_B \in (I_0 - \epsilon, I_0 + \epsilon)\big) = 1$,

(ii) $\lim_{B \to \infty} \frac{1}{B} \log \mathbb{E}[t_B] = I_0$,

where $I_0 = \inf\left\{ \int_0^{T_s} l(w(\tau), w'(\tau))\, d\tau : w(0) = 0,\ w(T_s) = 1,\ T_s > 0 \right\}$. ■
Proof: Please refer to [15, Theorem 6.17] for the proof. ■ 
Note that the escape time $t_B$ that the process $q_{\max}(t)$ takes to enter the set $\{q_{\max}(t) \geq 1\}$ determines the steady state probability for $q_{\max}(t)$ to be in the set $\{q_{\max}(t) \geq 1\}$, i.e., $\lim_{B \to \infty} \frac{1}{B} \log \mathbb{E}[t_B] = \lim_{B \to \infty} -\frac{1}{B} \log \Pr(q_{\max}(\infty) \geq 1)$. The above theorem guarantees that $I^* = I_0$, and we can find the large deviation decay rate by solving (37). In the next section, we shall find the LMF $g(x, \theta)$ to solve (37).



B. The Rate Function 

We first characterize $g(x, \theta)$. Note that the random variable $v(t)$ is a composition of the arrival $A_{m(t)}$ and departure $D_{m(t)}$, which are independent. Therefore, we have $g(x, \theta) = \log \mathbb{E}\big[e^{\theta(A(x) - D(x))}\big] = \log \mathbb{E}\big[e^{\theta A}\big] + \log \mathbb{E}\big[e^{-\theta D(x)}\big]$. We find the LMF of the random variable $D(x)$ using the result of the following lemma.

Lemma 8: Suppose there are S users feeding back to the BS. Then the data rate for user $k \in \mathcal{F}$ satisfies $R_k = l r$, where $r \to \log(P \log NS)$ almost surely (a.s.), and $l \to \mathrm{Poiss}(\rho)$ in distribution, with $\rho = \frac{M}{S}$, as $S \to \infty$. ■
Proof: Let $r(S)$ be the data rate for the i-th beam under S users feeding back to the BS. The results in [2] give that $\mathbb{E}[r(S)] / \log(P \log NS) \to 1$, as $S \to \infty$. Also, due to the channel hardening effect [22], the variance $\mathrm{Var}[r(S)] \to 0$, as $S \to \infty$. This shows that $r(S) \to \log(P \log NS)$, a.s.

According to the scheduling policy, each user $k \in \mathcal{F}$ may be allocated $l = 0, \ldots, \min\{M, N\}$ beams, and the data rate can be written as $R_k = l\, r(S)$. Since $\mathrm{SINR}^i_{k,n}$ are i.i.d. over k and n, the probability that a user is assigned $l$ beams approximately follows a binomial distribution $B(M, p)$, where $p = \frac{1}{S}$.

It is well-known that $B(M, p) \to \mathrm{Poiss}(\rho)$, as $M \to \infty$ and $Mp \to \rho$. Therefore, the result is proven. ■
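The quality of the binomial-to-Poisson approximation used in the proof can be gauged by the total variation distance between the two distributions. The sketch below is illustrative only: it computes both probability mass functions recursively and assumes $\rho < M$; the function name and the test values are not from the paper.

```python
import math

def tv_binomial_poisson(M, rho):
    """Total variation distance between Binomial(M, rho/M) and Poisson(rho), for rho < M."""
    p = rho / M
    b = (1.0 - p) ** M      # Binomial pmf at 0
    q = math.exp(-rho)      # Poisson pmf at 0
    diff, poisson_mass = abs(b - q), q
    for k in range(1, M + 1):
        b *= (M - k + 1) / k * p / (1.0 - p)   # Binomial pmf recursion
        q *= rho / k                            # Poisson pmf recursion
        diff += abs(b - q)
        poisson_mass += q
    # Poisson mass beyond M has no binomial counterpart.
    return 0.5 * (diff + max(0.0, 1.0 - poisson_mass))
```

For a fixed mean the distance shrinks as M grows, e.g. `tv_binomial_poisson(1000, 2.0)` is far smaller than `tv_binomial_poisson(10, 2.0)`. For a small number of beams the approximation is accurate mainly when $p = 1/S$ is small, i.e., when many users feed back.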

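Using the above result, the packet departure is $D(x) = R_k / L \approx \xi \mu_0$ for $k \in \mathcal{F}$, where $\mu_0 = r(S(x))/L$ and $\xi \sim \mathrm{Poiss}\big(\frac{M}{S(x)}\big)$. We have

$$g_D(x, \theta) = \log \mathbb{E}\big[e^{-\theta \xi \mu_0}\big] \approx \mu_0 \log \mathbb{E}\big[e^{-\theta \xi}\big].$$

Therefore, the LMF of $v(x)$ is given by $g(x, \theta) = g_A(\theta) + \mu(x)\big(e^{-\theta} - 1\big)$, where $\mu(x) = \mu_0(S(x)) \frac{M}{S(x)}$. We make use of the following lemma to prove the final result.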



Lemma 9: Assume that $l(x, y)$ in (38) is differentiable in $y$ at all $x$, which is nondegenerate in [0, 1]. For each $x$, the equation $g(x, \theta) = 0$ has at most two solutions. Then with the appropriate choice of $\theta^*(w)$, we have

$$I^* = \int_0^1 \theta^*(x)\, dx.$$

■

Proof: Please refer to [19, Lemma C.9] for the proof. ■
Using the result of the above lemma, we prove the theorem.
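For the LMF form $g(\theta) = \lambda(e^{\theta} - 1) + \mu(e^{-\theta} - 1)$ obtained above under Poisson arrivals, the local rate function in (38) has a closed-form maximizer. The following sketch, with hypothetical function names and a state-independent $\mu$ for simplicity, computes $l(y) = \sup_{\theta}\{\theta y - g(\theta)\}$ from the stationarity condition.

```python
import math

def lmf(theta, lam, mu):
    """g(theta) = lam (e^theta - 1) + mu (e^-theta - 1)."""
    return lam * (math.exp(theta) - 1.0) + mu * (math.exp(-theta) - 1.0)

def local_rate(y, lam, mu):
    """l(y) = sup_theta { theta * y - g(theta) }, via the stationary point of the concave objective."""
    # Stationarity: y - lam e^theta + mu e^-theta = 0, i.e. lam z^2 - y z - mu = 0 with z = e^theta > 0.
    z = (y + math.sqrt(y * y + 4.0 * lam * mu)) / (2.0 * lam)
    theta = math.log(z)
    return theta * y - lmf(theta, lam, mu)
```

Two special cases are easy to check by hand: at the mean drift $y = \lambda - \mu$ the cost is zero, and at $y = 0$ the cost is $(\sqrt{\mu} - \sqrt{\lambda})^2$. Paths that follow the mean drift are free, while holding the queue level against a negative drift has a positive cost.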

Appendix H 
Proof of Corollary 1

Under the CSI-only algorithm, the average data rate for each user k is given by $R_{k,\text{baseline}} \approx \frac{M \log(P \log NK)}{K}$. Note that the numerator is from Lemma 4, given that all the K users feed back to the BS, and the denominator is due to the fact that on average all the users have equal probability to get scheduled. Therefore, the packet departure rate of the maximum queue $Q_{\max}(t)$ is given by

$$\mu(x) = \frac{R_{k,\text{baseline}}}{L} \approx \frac{M \log(P \log NK)}{KL} = \mu_b.$$

Under Poisson arrivals, we have $g_A(x, \theta) = \log \mathcal{M}_A(\theta) = \lambda(e^{\theta} - 1)$. Thus solving $g(x, \theta) = \lambda(e^{\theta} - 1) + \mu(x)(e^{-\theta} - 1) = 0$, we obtain $e^{\theta} = 1$ or $e^{\theta} = \mu(x)/\lambda$. It is easy to verify that $e^{\theta} = 1$ only leads to $I_0 = 0$, which is not of interest. On the other hand, using $e^{\theta} = \mu(x)/\lambda$, we obtain the large deviation decay rate function from Theorem 4 as

$$I^*_{\text{baseline}} = \int_0^1 \log \frac{\mu(x)}{\lambda}\, dx = \log \frac{M \log(P \log NK) / (KL)}{\lambda_t / K} = \log \frac{M \log(P \log NK)}{\lambda_t L}.$$
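The two roots of $g(x, \theta) = 0$ used in this proof can be confirmed numerically. The sketch below uses illustrative rates with $\lambda < \mu$ and a plain bisection search (an implementation choice, not part of the proof) to locate the nontrivial root and compare it with $\log(\mu/\lambda)$.

```python
import math

def g(theta, lam, mu):
    """LMF under Poisson arrivals: lam (e^theta - 1) + mu (e^-theta - 1)."""
    return lam * (math.exp(theta) - 1.0) + mu * (math.exp(-theta) - 1.0)

def nontrivial_root(lam, mu, hi=50.0):
    """Positive root of g(theta) = 0 by bisection, assuming lam < mu."""
    lo = 1e-9  # just right of the trivial root; g is negative here since g'(0) = lam - mu < 0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid, lam, mu) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam, mu = 1.5, 6.0                 # illustrative rates with lam < mu
theta_star = nontrivial_root(lam, mu)
# With mu constant in x, the integral of theta* over [0, 1] equals theta* itself.
decay_rate = theta_star
```

Here `theta_star` agrees with `math.log(mu / lam)`, so the decay rate reduces to $\log(\mu_b / \lambda)$ as stated in the proof.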



Appendix I 
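Proof of Corollary 2

Using the result in Lemma 4, the average packet departure rate is given by

$$\mu_p(Q_{\max}) = \frac{M \log\big(P \log N S^*(Q_{\max})\big)}{L\, S^*(Q_{\max})}.$$

Note that such an approximation may be loose when $Q_{\max}$ is small. Instead, we use the following augmented approximation,

$$\tilde{\mu}_p(x) = \max\left\{ \frac{M \log\big(P \log N S^*(x)\big)}{L\, S^*(x)},\; \frac{M R_0}{LK} \right\}$$

where $R_0 = \int_0^{\infty} \log(1 + Px)\, dF(x)$. Note that $R_0$ is the average link capacity of a SISO channel and $M R_0$ is the average capacity under random beamforming. Hence $\frac{M R_0}{LK}$ is the average packet departure rate for the maximum queue process $Q_{\max}(t)$ under a round-robin scheduling policy, which serves as a lower bound for the proposed user scheduling algorithm.

Note that $\mu_p(x)$ is monotonically increasing. Define $\epsilon_K$ as the solution to $\mu_p(x) = \frac{M R_0}{LK}$, and $\epsilon = \inf\{\epsilon_K : K > K_0\}$ for some $K_0 < \infty$. Using Theorem 4, we have

$$I^* \geq \int_0^1 \log \frac{\tilde{\mu}_p(x)}{\lambda}\, dx$$

$$= \int_0^1 \log\left[ \frac{1}{\lambda_t / K} \max\left\{ \frac{M \log\big(P \log N S^*(x)\big)}{L\, S^*(x)},\; \frac{M R_0}{LK} \right\} \right] dx$$

$$= \log \frac{M}{\lambda_t L} + \int_0^{\epsilon} \log R_0\, dx + \int_{\epsilon}^{1} \log\left( \frac{\log[P \log N S^*(x)]}{S^*(x)}\, K \right) dx$$

$$\geq \log \frac{M}{\lambda_t L} + \epsilon \log R_0 + (1 - \epsilon) \log K + \int_{\epsilon}^{1} \log \frac{\log\big(P \log\big[N \cdot \frac{1}{N} \exp(W(c_1(x)))\big]\big)}{\frac{1}{N} \exp(W(c_1(x)))}\, dx$$

$$= \log \frac{M}{\lambda_t L} + \epsilon \log R_0 + (1 - \epsilon) \log K + C$$

where $C = \int_{\epsilon}^{1} \left\{ \log\left[ N \log\big(P\, W(c_1(x))\big) \right] - W(c_1(x)) \right\} dx$, with $c_1(x)$ denoting $c_1$ of Lemma 5 at queue level $x$.

The first inequality holds because $\tilde{\mu}_p(Q_{\max})$ is a lower bound approximation of the packet departure rate $\mu_p(Q_{\max}) \approx \frac{R_k}{L}$. Thus the corollary is proven.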